Department of Spatial Planning
ggplot2 is one of the core packages of the tidyverse. It is one of the most elegant and versatile systems for making graphs in R. ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs.
library(tidyverse) # loads readr (read_csv), dplyr (filter) and ggplot2

housing <- read_csv("./data/landdata-states.csv")
# create a subset for 1st quarter 2001
hp2001Q1 <- housing |>
  filter(Date == 2001.25)
head(housing[1:5]) # view first 5 columns
# A tibble: 6 × 5
  State region  Date Home_Value Structure_Cost
  <chr> <chr>  <dbl>      <dbl>          <dbl>
1 AK    West   2010.     224952         160599
2 AK    West   2010.     225511         160252
3 AK    West   2010.     225820         163791
4 AK    West   2010      224994         161787
5 AK    West   2008      234590         155400
6 AK    West   2008.     233714         157458
Now we want to map variables to visual aspects: here we map “Land_Value” and “Structure_Cost” to the x- and y-axes.
Here we use geom_point() to add a layer of points (dots) as the geometric shapes that represent the data.
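As a minimal sketch of these two steps, the code below maps Land_Value and Structure_Cost to the axes and then adds a point layer. The data frame `toy` is a made-up stand-in with the same column names as hp2001Q1, so the snippet runs on its own; its values are illustrative only.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; the values are illustrative, not real data
toy <- data.frame(
  Land_Value     = c(10000, 15000, 20000, 25000, 30000, 45000),
  Structure_Cost = c(90000, 95000, 110000, 105000, 120000, 140000)
)

# aes() maps variables to the axes; geom_point() draws one point per row
p <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost)) +
  geom_point()
p
```

With the real data, replacing `toy` with hp2001Q1 should give the corresponding scatter plot.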
A plot constructed with ggplot() can have more than one geom. In that case the mappings established in the ggplot() call are plot defaults that can be added to or overridden — this is referred to as aesthetic inheritance. Our plot could use a regression line:
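One way to sketch this is to add geom_smooth(method = "lm") as a second layer. Both layers inherit the x and y mappings from the ggplot() call. The data frame `toy` is again a made-up stand-in for hp2001Q1 with illustrative values.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; the values are illustrative, not real data
toy <- data.frame(
  Land_Value     = c(10000, 15000, 20000, 25000, 30000, 45000),
  Structure_Cost = c(90000, 95000, 110000, 105000, 120000, 140000)
)

# x and y are set once in ggplot() and inherited by both geoms
p <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost)) +
  geom_point() +
  geom_smooth(method = "lm") # fitted straight line plus a confidence ribbon
p
```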
Not all geometric objects are simple shapes; the smooth geom includes a line and a ribbon.
Each geom accepts a particular set of mappings; for example geom_text() accepts a label mapping.
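For instance, a label mapping could be supplied as in the sketch below, which draws each row's State value at its x/y position. The `toy` data frame and its placeholder state codes (S1 to S6) are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; state codes and values are placeholders
toy <- data.frame(
  State          = paste0("S", 1:6),
  Land_Value     = c(10000, 15000, 20000, 25000, 30000, 45000),
  Structure_Cost = c(90000, 95000, 110000, 105000, 120000, 140000)
)

# geom_text() needs a label aesthetic in addition to x and y
p <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost)) +
  geom_text(aes(label = State), size = 3)
p
```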
But what if we want to include both points and labels? We can use geom_text_repel() from the ggrepel package to keep labels from overlapping the points and each other.
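A possible version with both layers is sketched below. It assumes the ggrepel package is installed, and it uses the same made-up `toy` data frame with placeholder state codes.

```r
library(ggplot2)
library(ggrepel) # assumed to be installed; provides geom_text_repel()

# Made-up stand-in for hp2001Q1; state codes and values are placeholders
toy <- data.frame(
  State          = paste0("S", 1:6),
  Land_Value     = c(10000, 15000, 20000, 25000, 30000, 45000),
  Structure_Cost = c(90000, 95000, 110000, 105000, 120000, 140000)
)

# Points first, then labels that are nudged away from points and each other
p <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost)) +
  geom_point() +
  geom_text_repel(aes(label = State), size = 3)
p
```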
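Variables are mapped to aesthetics inside the aes() function, while constants are assigned to aesthetics outside the aes() call. This sometimes leads to confusion, as in this example: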
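The original example is not reproduced here, so the following is a sketch of the usual pitfall, using a made-up `toy` data frame. Placing a constant such as "red" inside aes() treats it as data, whereas placing it outside aes() sets the colour directly.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; the values are illustrative, not real data
toy <- data.frame(
  Land_Value     = c(10000, 15000, 20000, 25000, 30000, 45000),
  Structure_Cost = c(90000, 95000, 110000, 105000, 120000, 140000)
)

# Inside aes(): "red" becomes a one-level variable, so the points get the
# default palette colour and a legend rather than the colour red
p_mapped <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost)) +
  geom_point(aes(color = "red"))

# Outside aes(): color = "red" is a constant applied to every point
p_set <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost)) +
  geom_point(color = "red")

p_mapped
p_set
```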
Other aesthetics are mapped in the same way as x and y in the previous example.
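As an illustration, the sketch below maps region to the color aesthetic alongside x and y. The `toy` data frame is a made-up stand-in for hp2001Q1, and the region labels in it are placeholders.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; regions and values are placeholders
toy <- data.frame(
  region         = rep(c("West", "South", "Midwest"), each = 2),
  Land_Value     = c(10000, 15000, 20000, 25000, 30000, 45000),
  Structure_Cost = c(90000, 95000, 110000, 105000, 120000, 140000)
)

# color is mapped inside aes(), just like x and y; ggplot2 adds a legend
p <- ggplot(toy, aes(x = Land_Value, y = Structure_Cost, color = region)) +
  geom_point()
p
```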
We can change the binning scheme by passing the binwidth argument to the geom_histogram() function.
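A short sketch of this, assuming we want a histogram of Home_Value: binwidth is given in the units of the x variable. The `toy` values and the bin width of 20000 are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; the values are illustrative, not real data
toy <- data.frame(
  Home_Value = c(120000, 135000, 150000, 155000, 180000, 210000)
)

# Each bar covers a range of 20000 on the x-axis
p <- ggplot(toy, aes(x = Home_Value)) +
  geom_histogram(binwidth = 20000)
p
```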
An alternative visualization for distributions of numerical variables is a density plot. A density plot is a smoothed-out version of a histogram and a practical alternative, particularly for continuous data that comes from an underlying smooth distribution.
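In ggplot2 a density plot can be drawn with geom_density(). The sketch below applies it to Home_Value in a made-up `toy` data frame; the values are illustrative only.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; the values are illustrative, not real data
toy <- data.frame(
  Home_Value = c(120000, 135000, 150000, 155000, 180000, 210000)
)

# geom_density() draws a kernel density estimate of the x variable
p <- ggplot(toy, aes(x = Home_Value)) +
  geom_density()
p
```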
To visualize the relationship between a numerical and a categorical variable we can use side-by-side box plots. A boxplot is a type of visual shorthand for measures of position (percentiles) that describe a distribution.
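A side-by-side box plot could look like the sketch below, with the categorical variable on the x-axis and the numerical one on the y-axis. The `toy` data frame, its region labels and its values are made up for illustration.

```r
library(ggplot2)

# Made-up stand-in for hp2001Q1; regions and values are placeholders
toy <- data.frame(
  region     = rep(c("West", "South", "Midwest"), each = 2),
  Home_Value = c(120000, 135000, 150000, 155000, 180000, 210000)
)

# One box per region, summarising the distribution of Home_Value
p <- ggplot(toy, aes(x = region, y = Home_Value)) +
  geom_boxplot()
p
```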
Alternatively, we can make density plots with geom_density().
Additionally, we can map region to both the color and fill aesthetics and use the alpha aesthetic to add transparency to the filled density curves.
Once you’ve made a plot, you might want to get it out of R by saving it as an image that you can use elsewhere. That’s the job of ggsave(), which will save the plot most recently created to disk:
ggplot(hp2001Q1, aes(x = Home_Value, color = region, fill = region)) +
  geom_density(alpha = 0.5)
ggsave(filename = "houses_plot.png")
R for Data Science 2e, Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund